THE HIGH COST OF TESTING



SIDNEY L. PRESSEY 
Indiana University 



The other day the writer was asked for a list of group scales 
of intelligence available for use in the public schools. Five minutes 
spent in mentally reviewing recent work resulted in a list of twenty- 
seven such tests, and this list is probably not complete. Every one 
of these scales has been developed within the last four years. The 
length of this list is interesting evidence of the tremendous activity 
in the field of testing at present. This activity is not confined to 
work with tests of intelligence, however. A number of examina- 
tions have appeared composed of tests of achievement in the differ- 
ent subjects, these examinations making possible comprehensive 
surveys covering the major portion of the work in a given grade 
or grades. Besides all these new ventures, there still exist and 
multiply the tests in arithmetic, spelling, reading, etc., which 
occupied the entire field a few years ago. The progressive superin- 
tendent, interested in educational measurement, assuredly cannot 
complain of any lack of materials. Rather, he is bewildered by 
the multiplicity of the offerings. 

The fact is, there is a "boom" in tests. The wise school official 
will try, in such circumstances, to discriminate the good from the 
bad, and will neither be swept off his feet by the present enthusiasm 
nor discredit the whole movement because some of the work 
is injudicious. However, a superintendent or principal is often at 
a loss to know how to judge the various tests and scales which are 
brought to his attention. The present paper is simply an effort 
to present certain very simple ways in which such a school official 
may, from inspection of sample materials, and before actually 
investing in the materials for testing, form his own opinion as to 
their usefulness. The writer will not specifically mention any 
tests, but will rather consider certain general principles to be 
observed in evaluating test materials. The reader will find it 
easy enough to apply the few practical criteria suggested to any
particular scales he may be considering. 

The points considered in this paper are only three in number, 
and are of an altogether practical nature. They have to do chiefly 
with what may be called the "mechanics" of a test. Many workers
in the field look upon these matters as of little importance. There 
are other things, to be sure, about a test that a superintendent 
would do well to know before using it. But these three questions 
on the practical phases of testing should, the writer believes, be 
carefully considered first. Many tests will thus be rejected at 
once as impracticable, and final choice, on the basis of more funda- 
mental criteria, will be made easier. The writer would urge, then, 
that in selecting tests for use in his schools the superintendent 
should determine the following facts concerning the tests under 
consideration: (1) How much will the materials cost? (2) Are
the tests easily given? (3) Are the tests easily scored? These 
three questions will be taken up briefly in order. The intention 
will be, in each instance, to suggest ways in which a superintend- 
ent may determine, before actually using a given scale or test, cer- 
tain practical merits or faults of the material. 

1. How much will the materials cost? — This is the most elementary
question of them all, and the one on which a hard-headed
superintendent is least likely to make a mistake. But he occa- 
sionally does. The most important point to notice is just how 
much accessory material accompanies the test blank, and whether 
or not these accessory materials involve an extra charge. Tests 
differ a great deal in this respect. The writer has in mind a well- 
known test in a fundamental school subject in which the listed 
price of the blanks includes (a) the test folders on which the 
children work, (b) the score cards, (c) an adequate number of 
direction sheets, (d) a booklet of general directions — for tabulating, 
finding medians, etc., (e) a larger booklet containing instructions 
for comparing schools or classes, (f) a record sheet, and (g) a sheet
for recording repeated use of the test. This makes up, in the 
aggregate, a confusing amount of material. But it is all covered 
in the initial cost of the blanks. In contrast with this is a certain 
group scale of intelligence in which the primary cost covers only 
the booklets used by the pupils. But in order to use these blanks
the superintendent must buy also (a) booklets of directions for the 
examiner (of about sixty pages) and (b) score cards with scoring 
directions. The result is that the total expense of the blanks for a 
class is a fourth more than would be expected from the cost of the 
blanks as quoted. This latter arrangement (extra charge for direc- 
tions and materials for scoring and recording) is the more com- 
mon method of price quotation at present. 

The superintendent should also consider the expense of ship- 
ment. In some instances, the shipping expenses are prepaid; 
more often they are not. And since certain examinations require 
for each child a booklet which may contain as many as sixteen 
pages, with an examiner's booklet mounting up to fifty pages or 
more, besides scoring and tabulating materials, it is well to deter- 
mine in advance what the cost of shipping may be. The writer 
has found that small orders — under two hundred — require postage 
or express amounting to more than 15 per cent of the cost of the 
materials ordered. For large orders, the cost of shipment usually 
runs to at least 10 per cent of the cost of the materials. 
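
The items of cost named so far can be combined into a rough figure before any order is placed. The following Python sketch is an illustration only: the function name and its parameters are hypothetical, and the shipping rates are simply the writer's approximate figures quoted above (more than 15 per cent for small orders, at least 10 per cent for large ones), so the result should be read as a lower bound rather than a quotation.

```python
def estimated_total_cost(list_price, accessories=0.0, large_order=False):
    """Rough lower-bound estimate of what an order of test materials costs.

    list_price  -- quoted price of the pupils' blanks for the whole order
    accessories -- extra charge, if any, for directions, score cards,
                   and record sheets (zero when the list price covers them)
    large_order -- False for a small order (under two hundred blanks)
    """
    materials = list_price + accessories
    # Assumed rates, taken from the approximate figures in the text:
    # over 15 per cent for small orders, at least 10 per cent for large.
    shipping_rate = 0.10 if large_order else 0.15
    return materials * (1 + shipping_rate)


# Hypothetical order: blanks quoted at 8.00, with directions and score
# cards adding a fourth more (2.00), shipped as a small order.
total = estimated_total_cost(8.00, accessories=2.00)
print(f"Estimated outlay: {total:.2f}")
```

With these assumed figures the outlay comes to nearly half again the quoted price of the blanks, and the cost of paid scoring, where it applies, is still to be added.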

In case the tests are to be scored by a paid worker instead of 
by the teachers, the superintendent should add to the above- 
mentioned items the cost of scoring. There is only one way to 
determine the time and cost of scoring — have some of the blanks 
scored. It is possible to form only a very general judgment from 
mere inspection of a blank as to the length of time needed to score 
the blank. And any estimates sent out by the originators of the 
tests or by commercial houses handling them are almost sure to 
be considerably too low. 1

1 The writer confesses to being a guilty party in this connection. Each year he
has students who make it their business to score blanks rapidly and efficiently. As a
consequence, estimates based upon their performance suggest a burden of scoring
considerably less than may turn out to be the case with less skilled labor.

2. Are the tests easily given? — When the "directions" for the
use of a scale extend over a booklet of twenty to thirty pages, it is
evident that the teacher must make a somewhat elaborate
preparation if she is to handle the tests with any degree of accuracy.
She will need to spend a half or three-quarters of an hour merely
to read over the directions; and really to absorb them so that the
tests will function smoothly will take two or three hours more.
Such close study will be required even if the procedure is fairly
simple and straightforward. Procedure in many instances is
astoundingly elaborate and intricate.

The writer has before him an examination for children in the 
first three grades. The directions to the children cover fourteen 
pages in the "Manual of Directions"; the general directions to 
the examiner and the directions for scoring cover twelve pages 
more. In order to give this examination these fourteen pages 
must be practically memorized; the manual advises that the 
directions should not be read, but should be given from notes 
made on a copy of the blank used by the children, after "careful 
study" of the printed directions. That the examiner must indeed 
have a mastery of the verbal directions may be realized when 
the number of other things this person must do is fully under- 
stood. Not only must he give the directions to the pupils. He 
must illustrate certain tests on the board, occasionally using red, 
green, and blue chalk. He must point out certain places in the 
test blank to the children, watching to "make sure that pupils 
on his extreme right and left, as well as those in the rear rows, see 
where he is pointing." He must show the pupils "how to lift the 
forearm only, leaving the elbow on the desk to avoid fatigue" in 
raising the pencil between tests, and should "see that all children 
have their pencils up." He must do this, in fact, forty times 
(something of a job, with a restless first-grade class!). He must 
also watch for a hand raised to indicate a broken pencil, and 
supply a new pencil at once. He must observe closely the progress 
of the class, on most tests, so as to stop work "when from half to
three-quarters of the class have finished." Yet we are told that 
"the series can all be given by the classroom teacher." 

The amount of time necessary to prepare to use some of these 
group scales is enormous, and the interest of teachers in such work 
is being put to an impossible strain. There is a strong probability 
of a definite reaction against the measurement movement as 
impracticable, if the belief becomes prevalent that such burden- 
some methods are essential. As a matter of fact, they are not 
essential. Such intricate procedure is evidence of a lack of care
on the part of the originator in arranging his tests so that they can 
be given readily, and of a lack of consideration for the teacher. A 
good test should practically give itself. It is common among psy- 
chologists to hear the stupidity of the teachers, and their total 
inability to give tests in standard form, much berated. In reality, 
the stupidity is on the other side. If it is intended that a test 
be given by the teachers, then the burden is upon the originator 
to fit the test for such giving. If the test cannot be given by the 
average teacher, the originator has failed to do his work as he 
should. 

3. Are the tests easily scored? — However, after the test or scale
has been given, the business is by no means over with; there is 
still the scoring. It is coming to be generally agreed that a good 
test should not require written answers; this elimination of written 
responses greatly simplifies the scoring problem. However, 
examples of burdensome scoring methods are still appearing. 

Thus, the examination for the first grades already referred to 
requires in one place that the pupil draw a boy running. The 
scorer is told to score one point "for a man to consist of a trunk, 
two arms, two legs, and a head; one point more if he registers 
running." In another place one point is given for "a square 
around or near enough to the watch to show that the child identifies 
the article." For another item, the scorer is confronted with the 
following directions: 

PROBLEM 13 — TOTAL CREDIT 6

1 for 6 pennies marked, but not more than 6.
1 for 4 quarters marked (but not more than 4), with either a "2" or "5" or "25."
1 for writing correctly any number of pennies made of correct size, provided the number be not more than 6 or less than 2.
1 for writing correctly any number of quarters made of correct size, provided the number be 2, 3, or 4.
2 for indicating in any number language so that the examiner will know that the child has the idea, $1.06 (combined sum).

There is no excuse for tests requiring any such scoring
directions. No teacher should be expected to score such tests.

Again, a test may be fairly easily scored and still involve a
great deal of work before a final statement can be arrived at. 
Thus, in a recent examination there are fifteen tests. In four of 
these, the children are to answer by underlining a word; in two, 
they cross out a word; in nine, they write a number. This is not 
as systematic as it might be. But the real difficulties have not 
yet been reached. After one has counted the number of words 
correctly crossed out or underlined or the number of figures cor- 
rectly written, the final score is still to be determined. In one 
test the score is "the number of exercises right minus the number 
wrong, and in case the difference is negative, call it zero." In 
another it is "the number of spaces correctly filled in, divided by 
four; if a pupil has all 150 spaces correctly filled in, he receives 
the highest possible score, which is 150 divided by 4, or 37.5." In
three other tests, the correct score is, for the purpose of combining 
in the final rating, to be multiplied by 2. Again, a "rate" score 
is obtained by taking "the number that the pupil has marked as
indicating the line he was reading when time was called and 
dividing this number by 4." The number of exercises done cor- 
rectly makes up a "comprehension" score. However, on this test, 
the end is not yet. The scorer is told to "note that in the case of 
Examination II, 4 is to be added to the 'comprehension' score 
and 15 to the 'rate' score before the scores are transmuted into 
quotients." All the scores must now be transcribed from the 
fifteen inside pages, eight of which are upside down, to the front 
of the booklet, and then be added to give a "total point score." 
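
To make plain how many distinct rules the scorer must keep in mind, the separate scoring rules quoted above may be set down one by one. The following Python sketch is only an illustration: the function names are hypothetical, the examination itself is not identified in this paper, and nothing is assumed beyond the rules as quoted.

```python
def rights_minus_wrongs(right, wrong):
    """Number right minus number wrong; a negative difference counts as zero."""
    return max(right - wrong, 0)


def quarter_score(spaces_correct):
    """Number of spaces correctly filled in, divided by four."""
    return spaces_correct / 4


def doubled_score(score):
    """Three of the tests have their score multiplied by 2 before combining."""
    return score * 2


def rate_score(line_marked, examination_ii=False):
    """Line reached when time was called, divided by 4.

    For Examination II, 15 is added before conversion into quotients.
    """
    score = line_marked / 4
    return score + 15 if examination_ii else score


def comprehension_score(exercises_right, examination_ii=False):
    """Number of exercises done correctly; Examination II adds 4."""
    return exercises_right + 4 if examination_ii else exercises_right


print(rights_minus_wrongs(5, 8))   # difference is negative, so zero
print(quarter_score(150))          # all 150 spaces correct
print(rate_score(40, examination_ii=True))
```

Five separate rules are thus needed before a single figure can be carried to the front of the booklet, and each must be applied to the right test among the fifteen.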

However, this is not the final form of statement advised. The
scorer now adds "43 to the point score and divides by 10; the
quotient is the mental age." But a percentage statement is more
desirable, so the scorer divides the result of this last operation by
the child's chronological age — that is, this is done if the child is
not over 14; if he is, the scorer "adds 43 to the pupil's point score
and divides this by the average score for his chronological age,
plus 43." But for ages above 18, the point score is the same as
for 18. 1
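
The chain of computations just described may be written out as follows. This Python sketch rests on several assumptions that the paper does not settle: the function names are hypothetical, the unit of "mental age" is not stated, the final figure is left as a plain ratio because the text does not say how it is turned into a percentage, and the average score for a given age must come from the deviser's table, which is not reproduced here.

```python
def mental_age(point_score):
    """Add 43 to the point score and divide by 10."""
    return (point_score + 43) / 10


def quotient(point_score, chronological_age, average_score_for_age=None):
    """Ratio of the pupil's standing to that expected at his age.

    Not over 14: mental age divided by chronological age.
    Over 14: (point score + 43) divided by (average score for age + 43).
    The caller supplies the average score from the published table; for
    ages above 18 the text says the figure for 18 is to be used.
    """
    if chronological_age <= 14:
        return mental_age(point_score) / chronological_age
    if average_score_for_age is None:
        raise ValueError("an average score for the pupil's age is required above 14")
    return (point_score + 43) / (average_score_for_age + 43)


# Hypothetical pupil: point score 57, chronological age 10.
print(mental_age(57))
print(quotient(57, 10))
```

Even in this bare form the scorer has two formulas, an age limit, and a table to consult for every child.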

1 It is only fair to say that tables are provided for facilitating some of these
computations. The tables cut down the labor somewhat but hardly reduce the complexity
of the situation.

All the calculations described in the last paragraph are for but
one-half of the examination. The scorer now goes through much 
the same process for the other half and, by a comparison of the 
halves, obtains some idea of the relation of the child's achievement 
in school work to his ability to do such work. But the person who 
is scoring the examination, presumably the teacher, is not through 
with it all yet; the results must be recorded on the class-record 
sheet — which is no small task. And yet the deviser of this scale 
speaks of his method of handling data as unusually simple! 

As has been said at the beginning of this paper, there is now a 
"boom" in tests, and this "boom" may be regarded as an exag- 
gerated evidence of a growth that is fundamentally healthy. But 
there is a real danger in some of the elaborate examinations which 
have recently appeared. The measurements movement can sur- 
vive only if it renders a real service to the schools. It is fast com- 
ing to be largely a burden. As a matter of fact, there is nothing 
mysterious about a test; nor is there any magic in statistics. 
There is no reason why teachers and superintendents should not 
feel themselves competent to pass on the materials and methods 
offered them. If the tests are not convenient, or the statistics are 
overinvolved, these materials may well be rejected. A test is 
fundamentally an attempt to get at facts in a more direct way than 
would be otherwise possible; and statistical procedure is but a 
method of making those facts more readily available. The needs 
and convenience of superintendents and teachers should receive 
more consideration by those engaged in constructing tests. It is 
high time that this should be recognized, and that practical require- 
ments should be taken into account. 



